Information flow and optimization in transcriptional control

In the simplest view of transcriptional regulation, the expression of a gene is turned on or off by
changes in the concentration of a transcription factor (TF). We use recent data on noise levels in 
gene expression to show that it should be possible to transmit much more than just one regulatory 
bit. Realizing this optimal information capacity would require that the dynamic range of TF con- 
centrations used by the cell, the input/output relation of the regulatory module, and the noise levels 
of binding and transcription satisfy certain matching relations. This parameter-free prediction is in 
good agreement with recent experiments on the Bicoid/Hunchback system in the early Drosophila 
embryo, and this system achieves ~ 90% of its theoretical maximum information transmission. 



Cells control the expression of genes in part through 
transcription factors, proteins which bind to particular 
sites along the genome and thereby enhance or inhibit 
the transcription of nearby genes (Fig 1). We can think
of this transcriptional control process as an input/output
device in which the input is the concentration of
transcription factor and the output is the concentration of
the gene product. Although this qualitative picture has
been with us for roughly forty years, only recently
have there been quantitative measurements of in vivo
input/output relations and of the noise in output level
when the input is fixed [2, 3, 4, 5, 6, 7, 8, 9, 10, 11].
Because these input/output relations have a limited
dynamic range, noise limits the "power" of the cell to
control gene expression levels. In this paper, we quantify
these limits and derive the strategies that cells should use 
to take maximum advantage of the available power. We 
show that, to make optimal use of their regulatory
capacity, cells must achieve the proper quantitative matching
among the input/output relation, the noise level, and the 
distribution of transcription factor concentrations used 
during the life of the cell. We test these predictions 
against recent experiments on the Bicoid and Hunchback
morphogens in the early Drosophila embryo and find that
the observed distributions have a nontrivial structure
which is in good agreement with theory,
with no adjustable parameters. This suggests that, in 
this system at least, cells make nearly optimal use of the 
available regulatory capacity and transmit substantially 
more than the simple on/off bit that might suffice to 
delineate a spatial expression boundary. 

Gene expression levels (g) change in response to 
changes in transcription factor (TF) concentration (c). 
These changes often are summarized by an input/output
relation $\bar{g}(c)$ in which the mean expression level is
plotted as a function of TF concentration (Fig 1). The
average relationship is a smooth function but, because of
noise, this does not mean that arbitrarily small changes 
in input transcription factor concentration are meaning- 
ful for the cell. The noise in expression levels could even 
be so large that reliable distinctions can only be made
between (for example) "gene on" at high TF concentration
and "gene off" at low TF concentration. To explore
this issue, we need to quantify the number of reliably
distinguishable regulatory settings of the transcription
apparatus, a task to which Shannon's mutual information
[12, 13] is ideally suited. While there are many ways to
associate a scalar measure of correlation or control with 
a joint distribution of input and output signals, Shannon 
proved that mutual information is the only such quantity 
that satisfies certain plausible general requirements, in- 
dependent of the details of the underlying distributions. 
Mutual information has been successfully used to analyze
noise and coding in neural systems, and it is
natural to think that it may be useful for organizing our
understanding of gene regulation; see also Ref [15].

FIG. 1: Transcriptional regulation of gene expression. The
occupancy of the binding site by transcription factors sets
the activity of the promoter and hence the amount of protein
produced. The physics of TF-DNA interaction, transcription
and translation processes determine the conditional
distribution of expression levels g at fixed TF concentration
c, $P(g|c)$, shown here as a heat map with red (blue)
corresponding to high (low) probability. The mean input/output
relation is shown as a thick white line, and the dashed lines
indicate ± one standard deviation of the noise around this
mean. Two sample input distributions $P_{\rm TF}(c)$ (lower left)
are passed through $P(g|c)$ to yield two corresponding
distributions of outputs, $P_{\rm exp}(g)$ (lower right).

Roughly speaking, the mutual information I(c; g) be- 
tween TF concentration and expression level counts the 
(logarithm of the) number of distinguishable expression 
levels achieved by varying c. If we measure the informa- 
tion in bits, then 



$$
I(c;g) = \int dc\, P_{\rm TF}(c) \int dg\, P(g|c)\, \log_2\!\left[ \frac{P(g|c)}{P_{\rm exp}(g)} \right], \tag{1}
$$



where $P_{\rm TF}(c)$ is the distribution of TF concentrations
the cell generates in the course of its life, $P(g|c)$ is the
distribution of expression levels at fixed c, and $P_{\rm exp}(g)$
is the resulting distribution of expression levels,



$$
P_{\rm exp}(g) = \int dc\, P(g|c)\, P_{\rm TF}(c). \tag{2}
$$



The distribution, $P(g|c)$, of expression levels at fixed
transcription factor concentration describes the physics
of the regulatory element itself, from the protein/DNA
interaction to the rates of protein synthesis and degradation;
this distribution describes both the mean
input/output relation and the noise fluctuations around
the mean output. The information transmission, or regulatory
power, of the system is not determined by $P(g|c)$
alone, however, but also depends on the distribution,
$P_{\rm TF}(c)$, of transcription factor "inputs" that the cell
uses, as can be seen from Eq (1). By adjusting this
distribution to match the properties of the regulatory
element, the cell can maximize its regulatory power.
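To make the dependence on both distributions concrete, here is a minimal sketch of Eqs (1) and (2) on a discrete grid. The function name `mutual_information_bits` and the two toy channels are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def mutual_information_bits(p_c, p_g_given_c):
    """Discrete version of Eq (1).

    p_c:          shape (n_c,), probability of each input bin (sums to 1)
    p_g_given_c:  shape (n_c, n_g), each row a distribution over output bins
    """
    p_g = p_c @ p_g_given_c                      # Eq (2): distribution of outputs
    joint = p_c[:, None] * p_g_given_c           # joint distribution P(c, g)
    mask = joint > 0                             # skip 0 * log 0 terms
    ratio = p_g_given_c[mask] / np.broadcast_to(p_g, joint.shape)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

# Noiseless on/off switch: two equally likely inputs give two distinct outputs.
switch = np.array([[1.0, 0.0], [0.0, 1.0]])
print(mutual_information_bits(np.array([0.5, 0.5]), switch))   # one bit

# Completely noisy element: the output ignores the input.
noisy = np.array([[0.5, 0.5], [0.5, 0.5]])
print(mutual_information_bits(np.array([0.5, 0.5]), noisy))    # zero bits
```

Changing `p_c` while holding `p_g_given_c` fixed changes the result, which is the sense in which the input distribution can be tuned to the regulatory element.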

Matching the distribution of inputs to the (stochastic)
input/output relation of the system is a central concept
in information theory [13], and has been applied to
the problems of coding in the nervous system. For sensory
systems, the distribution of inputs is determined by
the natural environment, and the neural circuitry can
adapt, learn or evolve (on different time scales) to adjust
its input/output relation. It has been suggested that
maximizing information transmission is a principle which
can predict the form of this adaptation [16, 17, 18, 19].
In transcriptional regulation, by contrast, it seems more
appropriate to regard the input/output relation as fixed
and ask how the cell might optimize its regulatory power
by adjusting the distribution of TF inputs.

It is difficult to make analytic progress in the general
calculation of mutual information, but there is a simple
and plausible approximation. The expression level at a
fixed TF concentration c has a mean value $\bar{g}(c)$, which
we can plot as an input/output relation (Fig 1). Let
us assume that the fluctuations around this mean are
Gaussian with a variance $\sigma_g^2(c)$ which will itself depend
on the TF concentration. Formally this means that

$$
P(g|c) = \frac{1}{\sqrt{2\pi\sigma_g^2(c)}} \exp\left\{ -\frac{[g - \bar{g}(c)]^2}{2\sigma_g^2(c)} \right\}. \tag{3}
$$

Further let us assume that the noise level is small. Then
we can expand all of the relevant integrals as a power
series in the magnitude of $\sigma_g$:



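$$
I(c;g) = -\int dg\, \hat{P}_{\rm exp}(g) \log_2 \hat{P}_{\rm exp}(g)
 - \frac{1}{2} \int dg\, \hat{P}_{\rm exp}(g) \log_2\!\left[ 2\pi e\, \sigma_g^2(g) \right] + \cdots, \tag{4}
$$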



where $\cdots$ are terms that vanish as the noise level
decreases and $\hat{P}_{\rm exp}(g)$ is the probability distribution for
the average levels of expression. We can think of this as
the distribution that the cell is "trying" to generate, and
would generate in the absence of noise:

\begin{align}
\hat{P}_{\rm exp}(g) &= \int dc\, P_{\rm TF}(c)\, \delta[g - \bar{g}(c)] \tag{5} \\
&= P_{\rm TF}(c = c^*(g)) \left| \frac{d\bar{g}}{dc} \right|^{-1}_{c = c^*(g)}, \tag{6}
\end{align}



where $c^*(g)$ is the TF concentration at which the mean
expression level is g; similarly, by $\sigma_g(g)$ we mean $\sigma_g(c)$
evaluated at $c = c^*(g)$.
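The change of variables behind Eqs (5) and (6) can be checked numerically. The sketch below uses an assumed toy case (inputs uniform on [0, 1] and mean response $\bar{g}(c) = c^2$), chosen only because the density in Eq (6) is then known in closed form; none of these choices come from the experiments discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy example: P_TF uniform on [0, 1], mean response g_bar(c) = c**2.
c = rng.uniform(0.0, 1.0, size=500_000)
g = c**2                                    # noiseless outputs, as in Eq (5)

def p_hat_exp(g):
    """Eq (6) for the toy case: P_TF(c*) = 1, c*(g) = sqrt(g), dg_bar/dc = 2 c*."""
    return 1.0 / (2.0 * np.sqrt(g))

# Histogram of sampled outputs, normalized by the total number of samples.
counts, edges = np.histogram(g, bins=20, range=(0.05, 1.0))
centers = 0.5 * (edges[1:] + edges[:-1])
empirical = counts / (len(g) * np.diff(edges))

# Largest relative deviation between the histogram and the analytic density.
max_rel_err = np.max(np.abs(empirical - p_hat_exp(centers)) / p_hat_exp(centers))
print(max_rel_err)
```

The residual deviation comes from sampling noise and from evaluating the analytic density at bin centers rather than averaging it over each bin.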

We now can ask how the cell should adjust these
distributions to maximize the information being transmitted.
In the low-noise approximation summarized by Eq (4),
maximizing $I(c;g)$ poses a variational problem for
$\hat{P}_{\rm exp}(g)$ whose solution has a simple form:

\begin{align}
\hat{P}^*_{\rm exp}(g) &= \frac{1}{Z} \cdot \frac{1}{\sigma_g(g)}, \tag{7} \\
Z &= \int dg\, \frac{1}{\sigma_g(g)}. \tag{8}
\end{align}



This result captures the intuition that effective regulation
requires preferential use of signals that have high
reliability or low variance: $\hat{P}^*_{\rm exp}(g)$ is large where $\sigma_g$ is
small. The actual information transmitted for this optimal
distribution can be found by substituting $\hat{P}^*_{\rm exp}(g)$
into Eq (4), with the result $I_{\rm opt}(c;g) = \log_2(Z/\sqrt{2\pi e})$.
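The variational step can be spelled out briefly. In the sketch below, $\Lambda$ is a Lagrange multiplier introduced here to enforce normalization of $\hat{P}_{\rm exp}(g)$; it is not a symbol used elsewhere in the text, and terms beyond the low-noise approximation of Eq (4) are dropped.

```latex
\begin{align}
\mathcal{L}[\hat{P}_{\rm exp}] &= -\int dg\, \hat{P}_{\rm exp}(g)\log_2 \hat{P}_{\rm exp}(g)
  - \frac{1}{2}\int dg\, \hat{P}_{\rm exp}(g)\log_2\!\left[2\pi e\,\sigma_g^2(g)\right]
  - \Lambda\left[\int dg\, \hat{P}_{\rm exp}(g) - 1\right], \\
0 = \frac{\delta \mathcal{L}}{\delta \hat{P}_{\rm exp}(g)}
  &= -\log_2 \hat{P}_{\rm exp}(g) - \frac{1}{\ln 2}
  - \frac{1}{2}\log_2\!\left[2\pi e\,\sigma_g^2(g)\right] - \Lambda
  \quad\Longrightarrow\quad \hat{P}^*_{\rm exp}(g) = \frac{1}{Z}\,\frac{1}{\sigma_g(g)}, \\
I_{\rm opt}(c;g) &= \int dg\, \hat{P}^*_{\rm exp}(g)\log_2\!\left[Z\sigma_g(g)\right]
  - \frac{1}{2}\log_2(2\pi e) - \int dg\, \hat{P}^*_{\rm exp}(g)\log_2 \sigma_g(g)
  = \log_2\frac{Z}{\sqrt{2\pi e}} .
\end{align}
```

The $\log_2 \sigma_g$ terms cancel in the last line, which is why only the normalization constant $Z$ survives in the capacity.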

Although we initially formulated our problem as one
of optimizing the distribution of inputs, the low noise
approximation yields a result [Eq (7)] which connects the
optimal distribution of output expression levels to the 
variances of the same quantities, sampled across the life 
of a cell as it responds to natural variations in its environ- 
ment. To the extent that the small noise approximation 
is applicable, data on the variance vs mean expression 
thus suffice to calculate the maximum information ca- 
pacity; details of the input/output relation, such as its 
degree of cooperativity, do not matter except insofar as 
they leave their signature on the noise. 

Recent experiments provide the data for an application
of these ideas. Elowitz and coworkers have measured
gene expression noise in a synthetic system, placing
fluorescent proteins under the control of a lac-repressible
promoter in E. coli [2]. Varying the concentration of an
inducer, they determined the intrinsic variance of expression
levels across a bacterial population as a function of
mean expression level. Their results can be summarized
as $\sigma_g^2(g) = ag + bg^2$, where the expression level g is
normalized to have a maximum mean value of 1, and the
constants are $a = (5\text{--}7) \times 10^{-4}$ and $b = (3\text{--}10) \times 10^{-3}$.
Across most of the dynamic range ($g \gg 0.03$), the
small noise approximation should be valid and, as discussed
above, knowledge of $\sigma_g(g)$ alone suffices to compute
the optimal information transmission. We find
$I_{\rm opt}(c;g) \sim 3.5$ bits: rather than being limited to on/off
switching, these transcriptional control systems could in
principle specify $2^{I_{\rm opt}} \sim 10\text{--}12$ distinguishable levels
of gene expression! It is not clear whether this capacity,
measured in an engineered system, is available to
or used by E. coli in its natural environment. The
calculation does demonstrate, however, that optimal
information transmission values derived from real data are
more than one bit, but perhaps small enough to provide
significant constraints on regulatory function.
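The estimate of roughly 3.5 bits can be reproduced from the noise model alone. The sketch below evaluates $I_{\rm opt} = \log_2(Z/\sqrt{2\pi e})$ with $Z = \int_0^1 dg/\sigma_g(g)$; the function name, the integration limits 0 to 1, and the use of the midpoints of the quoted ranges for a and b are assumptions made for illustration.

```python
import math

def small_noise_capacity_bits(a, b, n=100_000):
    """I_opt = log2(Z / sqrt(2*pi*e)), Z = integral_0^1 dg / sigma_g(g),
    for the noise model sigma_g^2(g) = a*g + b*g**2.

    The substitution g = u**2 turns the integrand into 2 / sqrt(a + b*u**2),
    which is smooth at u = 0, so a plain midpoint rule is adequate.
    """
    h = 1.0 / n
    z = sum(2.0 / math.sqrt(a + b * ((k + 0.5) * h) ** 2) for k in range(n)) * h
    return math.log2(z / math.sqrt(2.0 * math.pi * math.e))

# Midpoints of the quoted parameter ranges; these specific values are assumed.
i_opt = small_noise_capacity_bits(a=6e-4, b=6e-3)
print(round(i_opt, 2), round(2 ** i_opt, 1))   # capacity in bits, number of levels
```

Moving a and b to the ends of their quoted ranges shifts the result by a few tenths of a bit, so the conclusion that the capacity is well above one bit does not depend on the exact choice.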

When the noise is not small, no simple analytic approaches
are available. On the other hand, so long as
$P(g|c)$ is known explicitly, our problem is equivalent
to one well-studied in communication theory, and efficient
numerical algorithms are available for finding the
input distribution $P_{\rm TF}(c)$ that optimizes the information
$I(c;g)$ defined in Eq (1) [20]. In general we must
extract $P(g|c)$ from experiment and, to deal with finite
data, we will assume that it has the Gaussian form of Eq
(3). $P(g|c)$ then is completely determined by measuring
just two functions of c: the mean input/output relation
$\bar{g}(c)$ and the output variance $\sigma_g^2(c)$. The central point
is that, in the general case, solving the information
optimization problem requires only empirical data on the
input/output relation and noise.
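One standard algorithm of this kind is the Blahut-Arimoto iteration for the capacity of a discrete channel. The sketch below applies it to a discretized Gaussian $P(g|c)$; that this is the specific algorithm used for the results reported here is an assumption, and the Hill-type mean response, the constant noise level, and both grids are invented for illustration.

```python
import numpy as np

def blahut_arimoto(p_g_given_c, tol=1e-9, max_iter=10_000):
    """Capacity (bits) and optimal input distribution of a discrete channel.

    p_g_given_c has shape (n_c, n_g); each row is P(g|c) on an output grid.
    """
    n_c = p_g_given_c.shape[0]
    p_c = np.full(n_c, 1.0 / n_c)                     # start from uniform inputs
    lower = 0.0
    for _ in range(max_iter):
        p_g = p_c @ p_g_given_c                       # Eq (2) on the grid
        # d[c] = KL divergence (nats) between P(g|c) and the current P_exp(g)
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(p_g_given_c > 0, np.log(p_g_given_c / p_g), 0.0)
        d = np.sum(p_g_given_c * log_ratio, axis=1)
        lower = np.log(np.sum(p_c * np.exp(d)))       # lower bound on capacity
        upper = np.max(d)                             # upper bound on capacity
        p_c = p_c * np.exp(d)                         # multiplicative update
        p_c /= p_c.sum()
        if upper - lower < tol:
            break
    return lower / np.log(2.0), p_c

# Illustrative (assumed) regulatory element: Hill-type mean, constant noise.
c = np.logspace(-1, 1, 60)                  # TF concentration, arbitrary units
g_bar = c**5 / (1.0 + c**5)                 # assumed mean input/output relation
sigma = 0.1 * np.ones_like(c)               # assumed noise standard deviation
g = np.linspace(-0.5, 1.5, 400)             # output grid
p = np.exp(-0.5 * ((g[None, :] - g_bar[:, None]) / sigma[:, None]) ** 2)
p /= p.sum(axis=1, keepdims=True)           # normalize each row of P(g|c)
capacity_bits, p_tf_opt = blahut_arimoto(p)
print(capacity_bits)
```

The returned `p_tf_opt` plays the role of the optimal $P_{\rm TF}(c)$ on the chosen grid, and `p_tf_opt @ p` gives the corresponding predicted distribution of expression levels.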

The initial events of pattern formation in the embryo
of the fruit fly Drosophila provide a promising
testing ground for the optimization principle proposed
here. These events depend on the establishment of
spatial gradients in the concentration of various morphogen
molecules, most of which are transcription factors
[21, 22]. To be specific, consider the response
of the hunchback (Hb) gene to the maternally established
gradient of the transcription factor Bicoid (Bcd)
[23, 24, 25, 26]. A recent experiment reports the Bcd
and Hb concentrations in thousands of individual nuclei
of the Drosophila embryo, using fluorescent antibody
staining [11]; the results can be summarized by the mean
input/output relation and noise level shown in Fig 2.
These data can be understood in some detail on the basis
of a simple physical model [27], but here we use the
experimental observations directly to make phenomenological
predictions about maximum available regulatory
power and optimal distribution of expression levels.




[Figure 2; horizontal axis: $\log(c/K_d)$]



FIG. 2: The Bcd/Hb input/output relationship in the
Drosophila melanogaster syncytium at early nuclear cycle 14
[11]. (a) Each point marks the Hb (g) and Bcd (c) concentration
in a single nucleus, as inferred from immunofluorescent
staining; data are from $\sim 11 \times 10^3$ individual nuclei across 9
embryos. Hb expression levels g are normalized so that the
maximum and minimum mean expression levels are 1 and 0,
respectively; small errors in the estimate of background
fluorescence result in some apparent expression values being
slightly negative. Bcd concentrations c are normalized by $K_d$,
the concentration of Bcd at which the mean Hb expression
level is half maximal. For details of normalization across
embryos, see [11]. Solid red line is a sigmoidal fit to the mean g
at each value of c, and error bars are ± one s.e.m. (b) Noise
in Hb as a function of Bcd concentration; error bars are ±
one s.d. across embryos, and the curve is a fit from Ref [27].



Given the measurements of the mean input/output relation
$\bar{g}(c)$ and noise $\sigma_g(c)$ shown in Fig 2, we can calculate
the maximum mutual information between Bcd and
Hb concentrations by following the steps outlined above;
we find $I_{\rm opt}(c;g) = 1.7$ bits. To place this result in context,
we imagine a system that has the same mean
input/output relation, but the noise variance is scaled by a
factor F, and ask how the optimal information transmission
depends on F. This is not just a mathematical trick:
for most physical sources of noise, the relative variance is
inversely proportional to the number of molecules, and so
scaling the expression noise variance down by a factor of
ten is equivalent to assuming that all relevant molecules
are present in ten times as many copies. We see in Fig
3 that there is a large regime in which the regulatory
power is well approximated by the small noise approximation.
In the opposite extreme, at large noise levels, we
expect that there are (at best!) only two distinguishable
states of high and low expression, so that our problem
approaches the asymmetric binary channel [28]. The exact
result interpolates smoothly between these two limiting
cases, with the real system (F = 1) lying closer to
the small noise limit, but deviating from it significantly.
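Within the small noise approximation, the dependence on F follows in one line from $I_{\rm opt} = \log_2(Z/\sqrt{2\pi e})$ with $Z = \int dg/\sigma_g(g)$. The superscript SN below is notation introduced here to mark the small-noise result; the relation does not describe the exact curve outside that regime.

```latex
\begin{align}
\sigma_g^2(g) \to F\,\sigma_g^2(g)
  \quad&\Longrightarrow\quad Z \to \frac{Z}{\sqrt{F}}
  \quad\Longrightarrow\quad
  I^{\rm SN}_{\rm opt}(F) = I^{\rm SN}_{\rm opt}(1) - \frac{1}{2}\log_2 F, \\
F = \frac{1}{10}
  \quad&\Longrightarrow\quad
  I^{\rm SN}_{\rm opt}\!\left(\tfrac{1}{10}\right) - I^{\rm SN}_{\rm opt}(1)
  = \frac{1}{2}\log_2 10 \approx 1.66 \ \text{bits}.
\end{align}
```

In this regime a tenfold increase in molecule number therefore buys somewhat less than two additional bits, reflecting the logarithmic dependence of capacity on the noise level.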

In the embryo, maximizing information flow from
transcription factor to target gene has a very special
meaning. Cells acquire "positional information," and
thus can take actions which are appropriate to their
position in the embryo, by responding to the local
concentration of morphogen molecules [21]. In the original
discussions, "information" was used colloquially. But in
the simplest picture of Drosophila development [22],
information in the technical sense really does flow from
physical position along the anterior-posterior axis to the
concentration of the primary maternal gradients (such
as Bcd) to the expression level of the gap genes (such as
Hb). Maximizing the mutual information between Bcd
and Hb thus maximizes the positional information that
can be carried by the Hb expression level.

FIG. 3: Optimal information transmission for the Bcd/Hb
system as a function of the noise variance rescaling factor F.
1/F is approximately equal to the factor by which the number
of input and output signaling molecules has to be increased
for the corresponding gain in capacity. Dashed and dotted
curves show the solutions in the small-noise and large-noise
approximations, respectively. The real system, F = 1, lies in
an intermediate region where neither the small-noise nor the
large-noise approximation is valid. Measured information
$I_{\rm data}(c;g)$ shown in red (error bar is s.d. over 9 embryos).

More generally, rather than thinking of each gap gene
as having its own spatial profile, we can think of the
expression levels of all the gap genes together as a code
for the position of each cell. In the same way that the
four bases (two bits) of DNA must code in triplets in
order to represent arbitrary sequences of 20 amino acids,
we can ask how many gap genes would be required to
encode a unique position in the $N_{\rm rows} \sim 100$ rows of
nuclei along the anterior-posterior axis. If the regulation
of Hb by Bcd is typical of what happens at this level
of the developmental cascade, then each letter of the
code is limited to less than two bits ($I_{\rm opt} = 1.7$ bits) of
precision; since $\log_2(N_{\rm rows})/I_{\rm opt} = 3.9$, the code would
need to have at least four letters. It is interesting, then,
to note that there are four known gap genes, hunchback,
krüppel, giant and knirps [29], which provide the initial
readout of the maternal anterior-posterior gradients.
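The counting argument is simple enough to check directly. The helper name `letters_needed` below is introduced only for this illustration; the numbers are those quoted in the text.

```python
import math

def letters_needed(n_symbols, bits_per_letter):
    """Minimum code length: total bits required divided by bits per letter."""
    ratio = math.log2(n_symbols) / bits_per_letter
    return ratio, math.ceil(ratio)

# Genetic code analogy: 20 amino acids, 2 bits per base.
codon_ratio, codon_length = letters_needed(20, 2.0)

# Positional code: about 100 rows of nuclei, 1.7 bits per gap gene.
gene_ratio, gene_count = letters_needed(100, 1.7)

print(codon_ratio, codon_length)
print(gene_ratio, gene_count)
```

The same ceiling operation that forces a triplet code for 20 amino acids gives four letters for about 100 positions at 1.7 bits per letter.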

Instead of plotting Hunchback expression levels vs either
position or Bcd concentration, we can ask about
the distribution of expression levels seen across all nuclei,
$P_{\rm exp}(g)$, as shown in Fig 4. The distribution is bimodal,
so that large numbers of nuclei have near zero or near
maximal Hb, consistent with the idea that there is an
expression boundary: cells in the anterior of the embryo
have Hb "on" and cells in the posterior have Hb "off."
But intermediate levels of Hunchback expression also occur
with nonzero probability, and the overall distribution
is quite smooth. We can compare this experimentally
measured distribution with the distribution predicted if
the system maximizes information flow, and we see from
Fig 4 that the agreement is quite good. The optimal
distribution reproduces the bimodality of the real system,
hinting in the direction of a simple on/off switch, but
also correctly predicts that the system makes use of
intermediate expression levels. From the data we can also
compute directly the mutual information between Bcd
and Hb levels, and we find $I_{\rm data}(c;g) = 1.5 \pm 0.15$ bits, or
~ 90% (0.88 ± 0.09) of the theoretical maximum.

The agreement between the predicted and observed 
distributions of Hunchback expression levels is encour- 
aging. We note, however, some caveats. Bicoid has mul- 
tiple targets and many of these genes have multiple in- 
puts [13], so to fully optimize information flow we need 
to think about a more complex problem than the single 
input, single output system considered here. Measure- 
ment of the distribution of expression levels requires a 
fair sampling of all the nuclei in the embryo, and this was 
not the intent of the experiments of [11]. Similarly, the
theoretical predictions depend somewhat on the behavior
of the input/output relation and noise at low expression
levels, which are difficult to characterize experimentally,
as well as on (possible) deviations from Gaussian
noise. A complete test of our theoretical predictions will 
thus require a new generation of experiments. 




FIG. 4: The measured (black) and optimal (red) distributions
of Hunchback expression levels. The measured distribution
is estimated from data of Ref [11], by making a histogram
of the g values for each data point in Fig 2. The optimal
solution corresponds to the capacity of $I_{\rm opt}(c;g) = 1.7$ bits.
The same plot is shown on logarithmic scale in the inset.

In summary, the functionality of a transcriptional regulatory
element is determined by a combination of its
input/output relation, the noise level, and the dynamic
range of transcription factor concentrations used by the
cell. In parallel to discussions of neural coding [17],
we have suggested that organisms can make maximal 
use of the available regulatory power by achieving con- 
sistency among these three different ingredients; in par- 
ticular, if we view the input/output relation and noise 
level as fixed, then the distribution of transcription fac- 
tor concentrations or expression levels is predicted by 
the optimization principle. Although many aspects of 
transcriptional regulation are well studied, especially in 
unicellular organisms, these distributions of protein con- 
centrations have not been investigated systematically. In 
embryonic development, by contrast, the distributions of 
expression levels can literally be read out from the spatial 
gradients in morphogen concentration. We have focused 
on the simplest possible picture, in which a single input 
transcription factor regulates a single target gene, but 
nonetheless find encouraging agreement between the pre- 
dictions of our optimization principle and the observed 
distribution of the Hunchback morphogen in Drosophila. 
We emphasize that our prediction is not the result of a 
model with many parameters; instead we have a theo- 
retical principle for what the system ought to do so as 
to maximize its performance, and no free parameters.